Abstract

We describe a computational framework for a grammar architecture in
which different linguistic domains such as morphology, syntax, and
semantics are treated not as separate components but as compositional domains.
Word and phrase formation are modeled as uniform processes contribut- 
ing to the derivation of the semantic form. The morpheme, as well as 
the lexeme, has lexical representation in the form of semantic content, 
tactical constraints, and phonological realization. The model is based on 
Combinatory Categorial Grammars. 

1 Introduction 

The division of morphology and syntax in agglutinative languages is difficult 
compared to relatively more isolating languages. For instance, in Turkish, there 
is a significant amount of interaction between morphology and syntax. Typical 
examples are: causative suffixes change the valence of the verb, and the
reciprocal suffix subcategorizes the verb for a noun phrase marked with the
comitative case. Moreover, the head that a bound morpheme modifies may not be
its stem but a compound head crossing over the word boundaries, e.g.,

(1) iyi oku-muş çocuk
    well read-REL child
    'well-educated child'

In (1), the relative suffix -muş (in past form of subject participle) modifies
[iyi oku] to give the scope [[[iyi oku]muş] çocuk]. If syntactic composition were
performed after morphological composition, we would get compositions such as
[iyi [okumuş çocuk]] or [[iyi okumuş] çocuk], which yield ill-formed semantics for
this utterance.

As pointed out by Oehrle [^, |6|, there is no reason to assume a layered 
grammatical architecture which has linguistic division of labor into compo- 
nents acting on one domain at a time. As a computational counterpart of 
this idea, rather than treating morphology, syntax and semantics in a cascaded 
manner, we integrate the process models of morphology and syntax, providing 

^This research is supported in part by grants from the Scientific and Technical
Research Council of Turkey (contract no. EEEAG-90), the NATO Science for Stability
Programme (contract name TU-LANGUAGE), and the METU Graduate School of Applied Sciences.
Figure 1: Scope ambiguity of a nominal bound morpheme

semantic composition in parallel. The model, which is based on Combinatory 
Categorial Grammars (CCG) P], uses the morpheme as the building block 
of composition in all three linguistic domains.

2 Morpheme-based Compositions 

When the morpheme is given the same status as the lexeme in terms of its 
lexical, syntactic, and semantic contribution, the distinction between the pro- 
cess models of morphotactics and syntax disappears. In this case, new scoping 
problems arise in word and phrase formation. 

CG accounts of scoping problems concentrate on syntactic and semantic is- 
sues such as quantifier scoping [P, 0]. In word formation, morphological brack- 
eting paradoxes are introduced by lexicalized composite affixes which require 
mixed compositions Q. However, the scoping problems in morphosyntax go be- 
yond bracketing paradoxes as they may also produce different semantic forms. 
Consider the example in (2):

(2) uzun kol-lu gömlek
    long sleeve-ADJ shirt

Two different compositions^ in CCG formalism are given in Figure 1. Both
interpretations are plausible, with (1a) being the most likely in the absence of
a long pause after the first adjective. To account for both cases, the suffix -lu

^Derived and basic categories in the examples are in fact feature structures; see section 4.
We use ^— ^ to denote the combination of categories x and y giving the result z 
Figure 2: Composition with a verbal bound morpheme

must be allowed to modify the head it is attached to (e.g., 1b in Figure 1), or
a compound head encompassing the word boundaries (e.g., 1a in Figure 1).
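
The two compositions of example (2) can be reproduced with a small sketch
that uses only forward and backward application. The categories below
(n for nouns, n/n for adjectives, (n/n)\n for the suffix -lu) and the
semantic labels are assumptions made for illustration; they are not taken
from Figure 1, and the actual categories in the model are feature structures.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fn:
    """A functor category: res/arg looks right, res\\arg looks left."""
    res: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Fn]


def combine(left, right):
    """Forward and backward application over (category, semantics) pairs."""
    (lc, ls), (rc, rs) = left, right
    out = []
    if isinstance(lc, Fn) and lc.slash == "/" and lc.arg == rc:
        out.append((lc.res, f"{ls}({rs})"))
    if isinstance(rc, Fn) and rc.slash == "\\" and rc.arg == lc:
        out.append((rc.res, f"{rs}({ls})"))
    return out


def parses(items):
    """All derivations of a morpheme sequence, over every binary bracketing."""
    if len(items) == 1:
        return [items[0]]
    results = []
    for i in range(1, len(items)):
        for l in parses(items[:i]):
            for r in parses(items[i:]):
                results.extend(combine(l, r))
    return results


ADJ = Fn("n", "/", "n")               # assumed adjective category
morphemes = [
    (ADJ, "long"),                    # uzun
    ("n", "sleeve"),                  # kol
    (Fn(ADJ, "\\", "n"), "with"),     # -lu (assumed category)
    ("n", "shirt"),                   # gömlek
]

readings = sorted(sem for cat, sem in parses(morphemes) if cat == "n")
for r in readings:
    print(r)
```

Because the suffix is an ordinary lexical item in the derivation, it may
combine either with kol alone or with the already composed [uzun kol]. Under
these assumed categories the sketch yields exactly two semantic forms, one for
each scope of -lu.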

Example (3) shows a composition with a verbal head. Figure 2 depicts the
CCG treatment of this example. The verb konuş does not subcategorize for a
dative noun phrase (cf. example 3b); kadına is the argument of dön. In this
case, the adverbial suffix -erek must modify [kadına dön] to obtain the correct
reading.

(3) a. kadın-a dön-erek konuş-tu
       woman-DATIVE turn-ADV talk-TENSE
       'Facing the lady, (he/she) talked.'

    b. * kadına konuştu



3 Multi-domain Combination Operator

Oehrle |^] describes a model of multi-dimensional composition in which every 
domain Di has an algebra with a finite set of primitive operations Fi. As 
indicated by Turkish data in sections 1 and 2, Fi may in fact have a domain
larger than, but compatible with, Di.

In order to perform morphological and syntactic compositions in a unified 
(monostratal) framework, the slash operators of categorial grammar must be 
enriched with the knowledge about the type of process and the type of mor- 
pheme. We adopt a representation similar to Hoeksema and Janda's notation 
for the operator. The 3-tuple <direction, morpheme type, process type> indicates
direction^ (left, right, unspecified), morpheme type (free, bound), and the type

^We have not yet incorporated word-order variation in syntax into our model. See
for a CCG based approach to this phenomenon. 



of morphological or syntactic attachment (e.g., affixation, syntactic concatena- 
tion, reduplication^, clitic). Examples of different operator combinations are as 
follows: 



Operator             Morpheme   Example

<\, bound, clitic>   de         Ben de yaz-ar-ım
                                I too write-TENSE-PERS
                                'I write too.'

<\, bound, affix>    -de        Ben-de kalem var
                                I-LOCATIVE pen exist
                                'I have a pen.'

</, bound, redup>    ap-        ap-açık durum
                                INT-clear situation
                                'Very clear situation'

</, free, concat>    uzun       uzun yol
                                long road
                                'long road'

<\, free, concat>    başka      bu-ndan başka
                                this-ABLATIVE other
                                'other than this'

<|, free, concat>    oku        adam kitab-ı oku-du
                                man book-ACC read-TENSE
                                or
                                adam okudu kitabı
                                'The man read the book'
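
As a rough illustration, the operator 3-tuple can be held in a small record
type. The class and member names below (Direction, MorphemeType, Process,
Operator, allows) are hypothetical; only the three fields and their possible
values come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    LEFT = "\\"          # argument is sought to the left
    RIGHT = "/"          # argument is sought to the right
    UNSPECIFIED = "|"    # argument may be on either side


class MorphemeType(Enum):
    FREE = "free"
    BOUND = "bound"


class Process(Enum):
    AFFIX = "affix"
    CONCAT = "concat"
    REDUP = "redup"
    CLITIC = "clitic"


@dataclass(frozen=True)
class Operator:
    direction: Direction
    morpheme: MorphemeType
    process: Process

    def __str__(self) -> str:
        return f"<{self.direction.value}, {self.morpheme.value}, {self.process.value}>"

    def allows(self, side: Direction) -> bool:
        """True if an argument found on `side` (LEFT or RIGHT) is acceptable."""
        return self.direction in (side, Direction.UNSPECIFIED)


# Operators for some of the morphemes listed in the examples above.
OPERATORS = {
    "de": Operator(Direction.LEFT, MorphemeType.BOUND, Process.CLITIC),
    "-de": Operator(Direction.LEFT, MorphemeType.BOUND, Process.AFFIX),
    "ap-": Operator(Direction.RIGHT, MorphemeType.BOUND, Process.REDUP),
    "uzun": Operator(Direction.RIGHT, MorphemeType.FREE, Process.CONCAT),
    "oku": Operator(Direction.UNSPECIFIED, MorphemeType.FREE, Process.CONCAT),
}

for morpheme, op in OPERATORS.items():
    print(morpheme, op)
```

Keeping the clitic de and the affix -de as distinct operators with the same
direction is what lets a single combination rule tell the two attachments
apart by process type alone.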



4 Information Structure and Tactical Constraints 

Entries in the categorial lexicon have tactical constraints, grammatical and 
semantic features, and phonological representation. Similar to HPSG [^], every 
entry is a signed attribute-value matrix. 

Syntactic and semantic information are of grammatical (g) sign and semantic
(s) sign, respectively. These properties include agreement features such as

^Intensifiers such as ap- and bes- in ap-açık and bes-belli may appear as prefixes, but they
are in fact reduplicated from the first syllable of the stem.



person, number, and possessive, and selectional restrictions: 
Basic and derived categories of CG are of p (property) or f (function) sign,
respectively.
RES-OP-ARG is the categorial notation for the element. Every RES and ARG
feature has an f or p sign.

Lexical and phrasal elements have functional representation (f or p sign) and
the PHON feature. PHON represents the phonological string. Lexical elements
may have (a) phonemes, (b) meta-phonemes such as H for high vowel, and D for
a dental stop whose voicing is not yet determined, and (c) optional segments,
e.g., -(y)lA, to model vowel/consonant drops, in the PHON feature. During
composition, the surface forms of composed elements are mapped and saved in 
PHON. PHON also allows efficient lexicon search. For instance, the causative 
suffix -DHr has eight different realizations but only one lexical entry. 
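
A minimal sketch of how a single lexical entry such as -DHr could be mapped to
its surface forms is given below. It assumes the standard Turkish rules of
vowel harmony and consonant voicing and is not the mapping used in the model;
the function name realize is hypothetical, and optional segments such as (y)
are not handled.

```python
VOWELS = "aeıioöuü"
VOICELESS = "çfhkpsşt"

# High-vowel harmony: H agrees with the last vowel in backness and rounding.
HIGH = {"a": "ı", "ı": "ı", "e": "i", "i": "i",
        "o": "u", "u": "u", "ö": "ü", "ü": "ü"}


def realize(stem: str, suffix: str) -> str:
    """Attach `suffix` to `stem`, resolving the meta-phonemes H, A and D.

    Assumes the stem contains at least one vowel.
    """
    out = stem
    for ch in suffix:
        last_vowel = next(c for c in reversed(out) if c in VOWELS)
        if ch == "H":                       # high vowel
            out += HIGH[last_vowel]
        elif ch == "A":                     # low unrounded vowel
            out += "a" if last_vowel in "aıou" else "e"
        elif ch == "D":                     # dental stop, voicing undetermined
            out += "t" if out[-1] in VOICELESS else "d"
        else:
            out += ch
    return out


stems = ["al", "bil", "kon", "gül", "yap", "kes", "koş", "öp"]
forms = {realize(s, "DHr")[len(s):] for s in stems}
print(sorted(forms))
```

The two choices for D and the four choices for H account for the eight
realizations mentioned above, while the lexicon stores only the one abstract
string.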

A special feature value called none is used for imposing certain morphotactic
constraints. For instance, most of the inflectional morphemes of Turkish have
the category X\X, where X is the category of the stem. The value none ensures
that the stem is not inflected with the same feature more than once; it
also ensures, through SYN constraints, that inflections are marked in the right
order. A sample lexicon entry for a derivational suffix is given in Figure 3.
For composition, we use a generalized LR parser in which CCG rules are 
encoded as recursive rewrite rules with equational constraints. 
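
The role of the none value described above can be sketched as follows. The
feature names, the three suffix entries, and the helper functions attach and
inflect are assumptions chosen for illustration; the constraints in the actual
lexicon are stated over attribute-value matrices and are not reproduced here.

```python
NONE = "none"

# A bare nominal stem: no inflectional feature has been marked yet.
BARE = {"number": NONE, "poss": NONE, "case": NONE}

# Each suffix requires some features to still be `none` and then sets one.
# Requiring later features to be unset is what enforces the order
# number < possessive < case (as in ev-ler-im-de 'in my houses').
SUFFIXES = {
    "-lAr":  {"requires": ("number", "poss", "case"), "sets": ("number", "plural")},
    "-(H)m": {"requires": ("poss", "case"),           "sets": ("poss", "1sg")},
    "-DA":   {"requires": ("case",),                  "sets": ("case", "locative")},
}


def attach(stem_feats, suffix):
    """Return the features of stem+suffix, or None if a constraint fails."""
    entry = SUFFIXES[suffix]
    if any(stem_feats[f] != NONE for f in entry["requires"]):
        return None
    feature, value = entry["sets"]
    return {**stem_feats, feature: value}


def inflect(suffixes):
    """Attach a sequence of suffixes to a bare stem, left to right."""
    feats = dict(BARE)
    for suffix in suffixes:
        feats = attach(feats, suffix)
        if feats is None:
            return None
    return feats


print(inflect(["-lAr", "-(H)m", "-DA"]))   # well-formed order
print(inflect(["-DA", "-lAr"]))            # case before plural: rejected
print(inflect(["-lAr", "-lAr"]))           # same feature twice: rejected
```

The suffix keeps the category of its stem, in the spirit of X\X, and only the
feature values change, so the same check blocks both repeated and misordered
inflections.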



5 Conclusion 



Turkish is a language in which grammatical functions can be marked morpho- 
logically (e.g., case), or syntactically (e.g., indirect objects). Semantic composi- 
tion is also affected by the interplay of morphology and syntax, for instance the 
change in the scope of modifiers and genitive suffixes, or valency and thematic 
role change in causatives. To model interactions between domains, we propose
a categorial approach in which composition in all domains proceeds in parallel.
In the domain of phonology, there are categorial accounts of prosody and 
voice assimilation [^]. Our treatment of phonology is not yet integrated into 
the uniform grammar architecture. Morphophonemic processes such as vowel 
harmony and devoicing are modeled as mappings from the operator and the 
phonological strings to surface forms. Integrating categorial phonology into the 
architecture will help restore the modularity of processing in all domains.
Figure 3: Lexicon entry for -lH.



